Final Project Proposal

finalpart1
Initial proposal for my final project
Author

Lindsay Jones

Published

October 7, 2022

Code
library(tidyverse)
── Attaching packages ─────────────────────────────────────── tidyverse 1.3.2 ──
✔ ggplot2 3.3.6      ✔ purrr   0.3.4 
✔ tibble  3.1.8      ✔ dplyr   1.0.10
✔ tidyr   1.2.1      ✔ stringr 1.4.1 
✔ readr   2.1.2      ✔ forcats 0.5.2 
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
Code
library(dplyr)

Part 1

Research Question

In the United States, wage stagnation has become a hot-button issue for workers in many fields. Graduate students have been at the center of this issue in recent years: strikes for wage increases and cost-of-living adjustments have taken place at multiple universities throughout the country. Because PhD students often do not have the time to earn extra income (and their contracts often prohibit them from pursuing work elsewhere), the size of their stipend is a major factor in deciding where to pursue their research (Powell, 2004; Soar et al., 2022). My research question is: What is the strongest predictor of the value of a PhD stipend?

Hypothesis

H₀: Cost of living is not the strongest predictor of the value of a PhD stipend.

H₁: Cost of living is the strongest predictor of the value of a PhD stipend.
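
The hypotheses above do not say how "strongest predictor" will be measured. One possible reading, sketched here purely as an assumption, is to standardize all variables, fit a linear model, and compare the absolute sizes of the standardized coefficients. The data below are simulated, and the column names `cost_of_living`, `prog_year`, and `pay` are hypothetical stand-ins rather than columns of the actual dataset.

```r
# Simulated data only: the effect sizes are invented so the example has a known answer.
set.seed(603)
n <- 200
sim <- data.frame(
  cost_of_living = rnorm(n, mean = 30000, sd = 5000),
  prog_year      = sample(1:6, n, replace = TRUE)
)
sim$pay <- 5000 + 0.7 * sim$cost_of_living + 300 * sim$prog_year +
  rnorm(n, mean = 0, sd = 2000)

# Standardize every column (mean 0, sd 1) so coefficients share a common scale.
sim_z <- as.data.frame(lapply(sim, function(x) as.numeric(scale(x))))

fit <- lm(pay ~ cost_of_living + prog_year, data = sim_z)

# Drop the intercept and find the predictor with the largest absolute coefficient.
std_coefs <- coef(fit)[-1]
strongest <- names(which.max(abs(std_coefs)))
strongest
```

Standardized coefficients are only one of several ways to rank predictors, and categorical predictors such as university or department would need a different comparison (for example, the change in R² when a variable is dropped).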

Dataset

This dataset consists of self-reported survey data collected by PhDStipends.com. Respondents are asked for their university, department, academic year, and year in the program. They are also asked whether they receive a 12-month or 9-month salary, their gross pay, and their required fees. PhDStipends automatically calculates the LW Ratio (living wage ratio), which is the stipend divided by the living wage of the county in which the university is located. I will likely need to add additional information for my own analysis.
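
As a small worked example of the living wage ratio described above, the sketch below divides a stipend by a living wage. The helper name `lw_ratio` is hypothetical, both inputs are assumed to be annual dollar amounts, and the numbers are illustrative rather than taken from the dataset.

```r
# Hypothetical helper: stipend divided by the local living wage.
# Assumes both arguments are annual dollar amounts.
lw_ratio <- function(stipend, living_wage) {
  stopifnot(is.numeric(stipend), is.numeric(living_wage))
  # The ratio is undefined when the living wage is missing or not positive.
  ifelse(is.na(living_wage) | living_wage <= 0,
         NA_real_,
         stipend / living_wage)
}

# Illustrative values only: a 30,000 stipend against a 25,000 living wage.
lw_ratio(stipend = 30000, living_wage = 25000)  # 1.2
```

A ratio above 1 means the stipend exceeds the living wage; a ratio below 1 means it falls short.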

The variables of interest for me are the university, department, and program year.

Code
library(readr)
csv <- read_csv("~/School/UMASS/DACSS 603/Final Project/csv.csv")
Rows: 12160 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (5): University, Status, Department, Category, AcYear
dbl (7): Pay, LW Ratio, ProgYear, 12 M Gross Pay, 9 M Gross Pay, 3 M Gross P...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
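
The summary further down shows the string `#N/A` appearing as a Category value. One way to treat it as missing at import time, sketched here on a tiny inline stand-in for the real file, is to pass it through the `na` argument of `read_csv()`. The two data rows are invented for illustration, and `stipends` is a hypothetical object name.

```r
library(readr)

# Inline stand-in for the real CSV; I() marks the string as literal data.
sample_text <- I(
  "University,Category,Pay\nExample U,#N/A,25000\nSample State,Social Science,30000\n"
)

stipends <- read_csv(
  sample_text,
  na = c("", "NA", "#N/A"),  # also treat the spreadsheet error string as missing
  show_col_types = FALSE     # quiet the column-specification message
)

# Count how many Category values were read as missing.
sum(is.na(stipends$Category))
```

The Category value `0` may also be a placeholder, but that is an assumption that would need checking against the source before recoding.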
Code
summary(csv)
  University           Status           Department          Category        
 Length:12160       Length:12160       Length:12160       Length:12160      
 Class :character   Class :character   Class :character   Class :character  
 Mode  :character   Mode  :character   Mode  :character   Mode  :character  
                                                                            
                                                                            
                                                                            
                                                                            
      Pay           LW Ratio        AcYear             ProgYear   
 Min.   :    1   Min.   :0.000   Length:12160       Min.   :1.00  
 1st Qu.:20000   1st Qu.:0.880   Class :character   1st Qu.:1.00  
 Median :26000   Median :1.130   Mode  :character   Median :1.00  
 Mean   :25765   Mean   :1.095                      Mean   :2.05  
 3rd Qu.:31500   3rd Qu.:1.330                      3rd Qu.:3.00  
 Max.   :96000   Max.   :4.120                      Max.   :6.00  
 NA's   :47      NA's   :422                        NA's   :1221  
 12 M Gross Pay   9 M Gross Pay   3 M Gross Pay        Fees      
 Min.   :     1   Min.   :   15   Min.   :    4   Min.   :    1  
 1st Qu.: 24000   1st Qu.:16500   1st Qu.: 3000   1st Qu.:  500  
 Median : 29000   Median :20000   Median : 5000   Median : 1000  
 Mean   : 28474   Mean   :20128   Mean   : 5194   Mean   : 2030  
 3rd Qu.: 33000   3rd Qu.:24000   3rd Qu.: 6204   3rd Qu.: 2000  
 Max.   :140000   Max.   :87467   Max.   :55816   Max.   :93725  
 NA's   :3632     NA's   :8551    NA's   :10951   NA's   :7404   
Code
print(summarytools::dfSummary(csv,
                              varnumbers = FALSE,
                              plain.ascii  = FALSE,
                              style        = "grid",
                              graph.magnif = 0.70,
                              valid.col    = FALSE),
      method = 'render',
      table.classes = 'table-condensed')

Data Frame Summary

csv

Dimensions: 12160 x 12
Duplicates: 339
Variable                  Stats / Values                        Freqs (% of Valid)    Missing

University [character]    1. University of Wisconsin -          230 (1.9%)            0 (0.0%)
                          2. Duke University (DU)               208 (1.7%)
                          3. University of North Carol          206 (1.7%)
                          4. University of California           205 (1.7%)
                          5. University of California,          204 (1.7%)
                          6. University of Michigan -           195 (1.6%)
                          7. University of Pennsylvani          193 (1.6%)
                          8. University of Southern Ca          191 (1.6%)
                          9. Pennsylvania State Univer          190 (1.6%)
                          10. University of Minnesota -         179 (1.5%)
                          [ 390 others ]                        10159 (83.5%)

Status [character]        1. Private                            4236 (34.8%)          0 (0.0%)
                          2. Public                             7924 (65.2%)

Department [character]    1. Chemistry                          530 (4.5%)            321 (2.6%)
                          2. Psychology                         391 (3.3%)
                          3. Sociology                          323 (2.7%)
                          4. Computer Science                   322 (2.7%)
                          5. Physics                            292 (2.5%)
                          6. English                            289 (2.4%)
                          7. Political Science                  286 (2.4%)
                          8. Biology                            266 (2.2%)
                          9. Economics                          197 (1.7%)
                          10. Biomedical Engineering            196 (1.7%)
                          [ 2916 others ]                       8747 (73.9%)

Category [character]      1. #N/A                               3310 (27.2%)          0 (0.0%)
                          2. 0                                  625 (5.1%)
                          3. Business/Policy                    211 (1.7%)
                          4. Formal Science                     1658 (13.6%)
                          5. Humanities                         919 (7.6%)
                          6. Natural Science                    3435 (28.2%)
                          7. Social Science                     2002 (16.5%)

Pay [numeric]             Mean (sd) : 25765.1 (9125.4)          3420 distinct values  47 (0.4%)
                          min ≤ med ≤ max: 1 ≤ 26000 ≤ 96000
                          IQR (CV) : 11500 (0.4)

LW Ratio [numeric]        Mean (sd) : 1.1 (0.4)                 253 distinct values   422 (3.5%)
                          min ≤ med ≤ max: 0 ≤ 1.1 ≤ 4.1
                          IQR (CV) : 0.5 (0.3)

AcYear [character]        1. 2020-2021                          2657 (21.9%)          2 (0.0%)
                          2. 2016-2017                          1959 (16.1%)
                          3. 2018-2019                          1708 (14.0%)
                          4. 2019-2020                          1347 (11.1%)
                          5. 2021-2022                          1194 (9.8%)
                          6. 2017-2018                          1111 (9.1%)
                          7. 2022-2023                          998 (8.2%)
                          8. 2014-2015                          524 (4.3%)
                          9. 2015-2016                          395 (3.2%)
                          10. 2013-2014                         90 (0.7%)
                          [ 14 others ]                         175 (1.4%)

ProgYear [numeric]        Mean (sd) : 2 (1.5)                   1: 6185 (56.5%)       1221 (10.0%)
                          min ≤ med ≤ max: 1 ≤ 1 ≤ 6            2: 1518 (13.9%)
                          IQR (CV) : 2 (0.7)                    3: 1191 (10.9%)
                                                                4: 951 (8.7%)
                                                                5: 740 (6.8%)
                                                                6: 354 (3.2%)

12 M Gross Pay [numeric]  Mean (sd) : 28473.9 (9013.8)          1608 distinct values  3632 (29.9%)
                          min ≤ med ≤ max: 1 ≤ 29000 ≤ 140000
                          IQR (CV) : 9000 (0.3)

9 M Gross Pay [numeric]   Mean (sd) : 20128.2 (7100.4)          1046 distinct values  8551 (70.3%)
                          min ≤ med ≤ max: 15 ≤ 20000 ≤ 87467
                          IQR (CV) : 7500 (0.4)

3 M Gross Pay [numeric]   Mean (sd) : 5194.4 (3370.8)           308 distinct values   10951 (90.1%)
                          min ≤ med ≤ max: 4 ≤ 5000 ≤ 55816
                          IQR (CV) : 3204 (0.6)

Fees [numeric]            Mean (sd) : 2030.1 (4711.9)           985 distinct values   7404 (60.9%)
                          min ≤ med ≤ max: 1 ≤ 1000 ≤ 93725
                          IQR (CV) : 1500 (2.3)

Generated by summarytools 1.0.1 (R version 4.2.1)
2022-11-13
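
The summary above reports 339 duplicate rows. A minimal sketch of counting and dropping exact duplicates with `dplyr::distinct()` follows; the toy rows are invented, and whether identical rows are true double submissions or separate respondents reporting the same values is a judgment call that this code does not settle.

```r
library(dplyr)

# Toy data: the second row is an exact copy of the first.
toy <- data.frame(
  University = c("Example U", "Example U", "Sample State"),
  Pay        = c(25000, 25000, 30000)
)

# Number of rows that repeat an earlier row.
n_dupes <- sum(duplicated(toy))

# Keep one copy of each distinct row.
deduped <- distinct(toy)

n_dupes
nrow(deduped)
```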

Based on this summary, there are some extreme outliers in need of removal, particularly in the Pay column, where values range from $1 to $96,000. Interestingly, the mean Pay of $25,765.1 does not seem unreasonable.
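
The proposal does not yet name a rule for removing outliers. As one common option, assumed here rather than taken from the source, the sketch below drops Pay values that fall more than 1.5 times the interquartile range outside the quartiles. The toy values are invented, apart from reusing the reported minimum and maximum as the two extremes.

```r
library(dplyr)

# Toy Pay values with two extremes (1 and 96000) at either end.
toy <- data.frame(Pay = c(1, 20000, 24000, 26000, 28000, 31500, 96000))

# Quartiles and the 1.5 * IQR fences (an assumed rule, not the author's).
q     <- quantile(toy$Pay, c(0.25, 0.75), na.rm = TRUE)
iqr   <- q[[2]] - q[[1]]
lower <- q[[1]] - 1.5 * iqr
upper <- q[[2]] + 1.5 * iqr

# Keep only non-missing values inside the fences.
trimmed <- toy %>%
  filter(!is.na(Pay), Pay >= lower, Pay <= upper)

nrow(trimmed)
```

A domain-based cutoff (for example, excluding stipends below some plausible annual minimum) may suit self-reported pay data better than a purely statistical fence.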

Part 2

References

Living Wage Calculator. (n.d.). Retrieved October 10, 2022, from https://livingwage.mit.edu/

Powell, K. (2004). Stipend survival. Nature, 428, 102–103. https://doi.org/10.1038/nj6978-102a

Roberts, E., & Roberts, K. (2022, October 10). PhD stipends dataset. http://www.phdstipends.com/csv

Soar, M., Stewart, L., Nissen, S., et al. (2022). Sweat equity: Student scholarships in Aotearoa New Zealand’s universities. New Zealand Journal of Educational Studies. https://doi.org/10.1007/s40841-022-00244-5